The first two chunks of this R Markdown file (after the setup chunk) enable plot zooming, which means the knitted HTML file must be opened in a web browser to display properly. When the document knits in RStudio the preview will appear empty. When you open the HTML file in a browser, all of the content is shown and you can click on any plot to zoom in on it.

Before you begin

Notes

A few notes about this script.

If you are running this with the 2022-2023 data, make sure you download the whole [OSM_2022-2023 GitHub repository](https://github.com/ACMElabUvic/OSM_2022-2023) from the ACMElabUvic GitHub. This ensures you have all the files, data, and folder structure needed to run this code and the associated analyses.

Also make sure you open RStudio through the R project (OSM_2022-2023.Rproj). This automatically sets your working directory to the correct place (wherever you saved the repository), so you don’t have to change the file paths for any of the data.
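If the project was not opened through the .Rproj file, the relative paths used in this script (e.g., data/processed/OSM_2022_covariates.csv) will not resolve. Below is a minimal sketch of a check you could run first. check_project_files() is a hypothetical helper, not part of the repository, and relies only on base R's file.exists().

```r
# Hypothetical helper (not part of the repository): stop early with a clear
# message if expected project files are not found relative to the working directory
check_project_files <- function(paths) {
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Not found relative to ", getwd(), ": ",
         paste(missing, collapse = ", "),
         "\nDid you open RStudio through OSM_2022-2023.Rproj?",
         call. = FALSE)
  }
  # Return TRUE invisibly so the check is silent when everything is found
  invisible(TRUE)
}
```

For example, check_project_files('data/processed/OSM_2022_covariates.csv') returns silently when the file is found and otherwise stops with the current working directory in the error message.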

Lastly, if you are adapting this code for a future year of data, make sure you have first run ACME_camera_script_9-2-2024.R (or .Rmd) on your data. That script does the data formatting, cleaning, and restructuring this code depends on.

If you have questions, please email the most recent author, currently:

Marissa A. Dyck
Postdoctoral research fellow
University of Victoria
School of Environmental Studies
Email: marissadyck17@gmail.com

(update/add authors as needed)

Install packages

If you don’t already have the following packages installed, use the code below to install them.

install.packages('tidyverse')
install.packages('PerformanceAnalytics')
install.packages('Hmisc')
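Alternatively, here is a minimal sketch that installs only the packages you don't already have. It uses base R's requireNamespace(), which loads (without attaching) a package and returns FALSE if it is not installed. The interactive() guard is an added assumption so that knitting the document never triggers an install.

```r
# Packages used in this script
required_pkgs <- c("tidyverse", "PerformanceAnalytics", "Hmisc")

# requireNamespace() loads (without attaching) each package and returns
# FALSE instead of erroring when the package is not installed
is_installed <- vapply(required_pkgs, requireNamespace, logical(1),
                       quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

# Only install from an interactive session, so knitting never triggers an install
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```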

Load libraries

Then load the packages to your library.

library(tidyverse) # data tidying, visualization, and much more; this will load all tidyverse packages, can see complete list using tidyverse_packages()
library(PerformanceAnalytics)    #Used to generate a correlation plot
library(Hmisc) # used to generate histograms for all variables in data frame

Read in covariate data

To do any analysis with the detection data from the OSM arrays, we need to pair it with the covariate data, which contain human factors indices (HFI) and landcover (VEG) data for each site. These datasets contain many covariates (features) that must be grouped before they are usable, which is what this script covers.

Let’s read in the covariate data (the output of ACME_camera_script_9-2-2024.Rmd).

# model covariates (merged HFI and VEG data from the ACME_camera_script_9-2-2024.R or .Rmd)
covariates <- read_csv('data/processed/OSM_2022_covariates.csv',
                       
                       # set the column types to read in correctly
                       col_types = cols(array = col_factor(),
                                        camera = col_factor(),
                                        site = col_factor(),
                                        buff_dist = col_factor(),
                                        .default = col_number()))

# check variable structure
str(covariates)
## spc_tbl_ [3,100 × 119] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ array                       : Factor w/ 4 levels "LU13","LU15",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ camera                      : Factor w/ 96 levels "18","15","03",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ site                        : Factor w/ 155 levels "LU13_18","LU13_15",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ buff_dist                   : Factor w/ 20 levels "250","500","750",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ vegetated_edge_roads        : num [1:3100] 0 0.0858 0 0 0 ...
##  $ harvest_area                : num [1:3100] 0 0 0.687 0.337 0 ...
##  $ road_gravel_1l              : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ conventional_seismic        : num [1:3100] 0 0.03276 0 0.00889 0.01145 ...
##  $ tame_pasture                : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ pipeline                    : num [1:3100] 0 0.068 0 0 0.0301 ...
##  $ road_gravel_2l              : num [1:3100] 0 0 0 0 0 ...
##  $ trail                       : num [1:3100] 0.00588 0.0028 0 0.00196 0 ...
##  $ well_bitumen                : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ rough_pasture               : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ well_aband                  : num [1:3100] 0 0 0 0 0.0322 ...
##  $ road_unclassified           : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ crop                        : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ low_impact_seismic          : num [1:3100] 0 0 0 0 0.0523 ...
##  $ clearing_unknown            : num [1:3100] 0.0923 0.0697 0 0 0 ...
##  $ cultivation_abandoned       : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ road_paved_undiv_2l         : num [1:3100] 0 0.0174 0 0 0 ...
##  $ road_unimproved             : num [1:3100] 0 0 0 0 0 ...
##  $ truck_trail                 : num [1:3100] 0 0 0 0.0139 0 ...
##  $ dugout                      : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ road_paved_undiv_1l         : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ well_gas                    : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ vegetated_edge_railways     : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ harvest_area_white_zone     : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ country_residence           : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ borrowpit_dry               : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ rural_residence             : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ borrowpit_wet               : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ borrowpits                  : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ grvl_sand_pit               : num [1:3100] 0 0.0873 0 0 0 ...
##  $ ris_reclaimed_temp          : num [1:3100] 0 0.0477 0 0 0 ...
##  $ ris_clearing_unknown        : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_drainage                : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_mines_oilsands          : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_overburden_dump         : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_facility_operations     : num [1:3100] 0 0 0 0 0 ...
##  $ transmission_line           : num [1:3100] 0.0642 0 0 0 0.091 ...
##  $ ris_tailing_pond            : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ clearing_wellpad_unconfirmed: num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ mines_oilsands              : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_soil_replaced           : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ road_paved_1l               : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_oilsands_rms            : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_facility_unknown        : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_borrowpits              : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_transmission_line       : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_soil_salvaged           : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_road                    : num [1:3100] 0 0 0 0 0 ...
##  $ ris_plant                   : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ urban_residence             : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ facility_other              : num [1:3100] 0 0 0 0 0 ...
##  $ airp_runway                 : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ runway                      : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_reclaimed_permanent     : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ urban_industrial            : num [1:3100] 0.291 0 0 0 0 ...
##  $ lagoon                      : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ facility_unknown            : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ residence_clearing          : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ well_cased                  : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ road_unpaved_2l             : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ road_paved_3l               : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ surrounding_veg             : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ rlwy_sgl_track              : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ road_winter                 : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ sump                        : num [1:3100] 0 0 0 0 0 ...
##  $ greenspace                  : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ road_paved_2l               : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ well_other                  : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ canal                       : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ reservoir                   : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ well_cleared_not_confirmed  : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ misc_oil_gas_facility       : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ camp_industrial             : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_camp_industrial         : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ oil_gas_plant               : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ well_unknown                : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_utilities               : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ cfo                         : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ recreation                  : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ campground                  : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ peat                        : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ golfcourse                  : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ landfill                    : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ transfer_station            : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ mill                        : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ road_paved_div              : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ rlwy_spur                   : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ well_cleared_not_drilled    : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ open_pit_mine               : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ well_oil                    : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ road_paved_4l               : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ mines_pitlake               : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_reclaimed_certified     : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ ris_windrow                 : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tailing_pond                : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##   [list output truncated]
##  - attr(*, "spec")=
##   .. cols(
##   ..   .default = col_number(),
##   ..   array = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
##   ..   camera = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
##   ..   site = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
##   ..   buff_dist = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
##   ..   vegetated_edge_roads = col_number(),
##   ..   harvest_area = col_number(),
##   ..   road_gravel_1l = col_number(),
##   ..   conventional_seismic = col_number(),
##   ..   tame_pasture = col_number(),
##   ..   pipeline = col_number(),
##   ..   road_gravel_2l = col_number(),
##   ..   trail = col_number(),
##   ..   well_bitumen = col_number(),
##   ..   rough_pasture = col_number(),
##   ..   well_aband = col_number(),
##   ..   road_unclassified = col_number(),
##   ..   crop = col_number(),
##   ..   low_impact_seismic = col_number(),
##   ..   clearing_unknown = col_number(),
##   ..   cultivation_abandoned = col_number(),
##   ..   road_paved_undiv_2l = col_number(),
##   ..   road_unimproved = col_number(),
##   ..   truck_trail = col_number(),
##   ..   dugout = col_number(),
##   ..   road_paved_undiv_1l = col_number(),
##   ..   well_gas = col_number(),
##   ..   vegetated_edge_railways = col_number(),
##   ..   harvest_area_white_zone = col_number(),
##   ..   country_residence = col_number(),
##   ..   borrowpit_dry = col_number(),
##   ..   rural_residence = col_number(),
##   ..   borrowpit_wet = col_number(),
##   ..   borrowpits = col_number(),
##   ..   grvl_sand_pit = col_number(),
##   ..   ris_reclaimed_temp = col_number(),
##   ..   ris_clearing_unknown = col_number(),
##   ..   ris_drainage = col_number(),
##   ..   ris_mines_oilsands = col_number(),
##   ..   ris_overburden_dump = col_number(),
##   ..   ris_facility_operations = col_number(),
##   ..   transmission_line = col_number(),
##   ..   ris_tailing_pond = col_number(),
##   ..   clearing_wellpad_unconfirmed = col_number(),
##   ..   mines_oilsands = col_number(),
##   ..   ris_soil_replaced = col_number(),
##   ..   road_paved_1l = col_number(),
##   ..   ris_oilsands_rms = col_number(),
##   ..   ris_facility_unknown = col_number(),
##   ..   ris_borrowpits = col_number(),
##   ..   ris_transmission_line = col_number(),
##   ..   ris_soil_salvaged = col_number(),
##   ..   ris_road = col_number(),
##   ..   ris_plant = col_number(),
##   ..   urban_residence = col_number(),
##   ..   facility_other = col_number(),
##   ..   airp_runway = col_number(),
##   ..   runway = col_number(),
##   ..   ris_reclaimed_permanent = col_number(),
##   ..   urban_industrial = col_number(),
##   ..   lagoon = col_number(),
##   ..   facility_unknown = col_number(),
##   ..   residence_clearing = col_number(),
##   ..   well_cased = col_number(),
##   ..   road_unpaved_2l = col_number(),
##   ..   road_paved_3l = col_number(),
##   ..   surrounding_veg = col_number(),
##   ..   rlwy_sgl_track = col_number(),
##   ..   road_winter = col_number(),
##   ..   sump = col_number(),
##   ..   greenspace = col_number(),
##   ..   road_paved_2l = col_number(),
##   ..   well_other = col_number(),
##   ..   canal = col_number(),
##   ..   reservoir = col_number(),
##   ..   well_cleared_not_confirmed = col_number(),
##   ..   misc_oil_gas_facility = col_number(),
##   ..   camp_industrial = col_number(),
##   ..   ris_camp_industrial = col_number(),
##   ..   oil_gas_plant = col_number(),
##   ..   well_unknown = col_number(),
##   ..   ris_utilities = col_number(),
##   ..   cfo = col_number(),
##   ..   recreation = col_number(),
##   ..   campground = col_number(),
##   ..   peat = col_number(),
##   ..   golfcourse = col_number(),
##   ..   landfill = col_number(),
##   ..   transfer_station = col_number(),
##   ..   mill = col_number(),
##   ..   road_paved_div = col_number(),
##   ..   rlwy_spur = col_number(),
##   ..   well_cleared_not_drilled = col_number(),
##   ..   open_pit_mine = col_number(),
##   ..   well_oil = col_number(),
##   ..   road_paved_4l = col_number(),
##   ..   mines_pitlake = col_number(),
##   ..   ris_reclaimed_certified = col_number(),
##   ..   ris_windrow = col_number(),
##   ..   tailing_pond = col_number(),
##   ..   rlwy_mlt_track = col_number(),
##   ..   rlwy_dbl_track = col_number(),
##   ..   ris_waste = col_number(),
##   ..   interchange_ramp = col_number(),
##   ..   road_paved_5l = col_number(),
##   ..   ris_airp_runway = col_number(),
##   ..   fruit_vegetables = col_number(),
##   ..   road_unpaved_1l = col_number(),
##   ..   ris_reclaim_ready = col_number(),
##   ..   ris_tank_farm = col_number(),
##   ..   lc_class20 = col_number(),
##   ..   lc_class32 = col_number(),
##   ..   lc_class33 = col_number(),
##   ..   lc_class34 = col_number(),
##   ..   lc_class50 = col_number(),
##   ..   lc_class110 = col_number(),
##   ..   lc_class120 = col_number(),
##   ..   lc_class210 = col_number(),
##   ..   lc_class220 = col_number(),
##   ..   lc_class230 = col_number()
##   .. )
##  - attr(*, "problems")=<externalptr>
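Note that buff_dist is stored as a factor whose levels are in order of first appearance ("250", "500", "750", and so on). Calling as.numeric() on it returns the internal level codes rather than the distances, so convert through character first. Here is a small illustration with a toy factor, not the real data:

```r
# Toy factor mimicking buff_dist (levels in order of appearance, as col_factor() creates them)
buff_dist <- factor(c("250", "500", "750", "1000", "250"),
                    levels = c("250", "500", "750", "1000"))

as.numeric(buff_dist)                 # level codes: 1 2 3 4 1 (not distances)
as.numeric(as.character(buff_dist))   # distances: 250 500 750 1000 250

# == on a factor compares its labels, so comparing to a number still works
buff_dist == 1000                     # FALSE FALSE FALSE TRUE FALSE
```

The last line is why filter(buff_dist == 1000) in the summary section works: equality comparisons on a factor use its labels, not its codes.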

Data exploration

There are too many covariates to include in the models individually and many of them describe similar HFI features.
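Many features share a name prefix (for example the road_*, well_*, and rlwy_* columns). Below is a minimal sketch of collapsing such groups by summing their values, assuming a simple prefix rule. group_by_prefix() is a hypothetical helper, the data are made-up toy values, and this is not the grouping defined in covariates_table.docx.

```r
# Toy data frame with a few HFI columns (made-up values, for illustration only)
toy <- data.frame(
  site           = c("A", "B"),
  road_gravel_1l = c(0.010, 0.000),
  road_paved_2l  = c(0.005, 0.020),
  well_gas       = c(0.000, 0.003),
  well_oil       = c(0.001, 0.000)
)

# Hypothetical rule: sum all columns that start with "<prefix>_" into "<prefix>_total"
group_by_prefix <- function(df, prefixes) {
  for (p in prefixes) {
    cols <- grep(paste0("^", p, "_"), names(df), value = TRUE)
    df[[paste0(p, "_total")]] <- rowSums(df[cols])
  }
  df
}

grouped <- group_by_prefix(toy, c("road", "well"))
grouped[c("site", "road_total", "well_total")]
```

A prefix rule is crude. On the real data, ^road_ would also catch road_winter and road_unimproved, and the ris_ prefix mixes many feature types, so check each group against the ABMI descriptions. In the tidyverse the same idea can be written as mutate(road_total = rowSums(across(starts_with("road_")))).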

Now that this section is finalized, we will use the structure outlined in covariates_table.docx (in the ‘relevant_literature’ folder of this repository) to format the covariates for this and future related analyses. The code below still walks through the data exploration that informed some of the decisions in covariates_table.docx, so anyone who wants to group the data differently has code to explore it.

The covariates_table and the README file in this repository include descriptions of each feature, taken from the ABMI human footprint wall-to-wall data download website for 2021. The same descriptions are in the relevant_literature folder of this repository (HFI_2021_v1_0_Metadata_Final.pdf).
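A useful first exploratory step is to find features that are zero at every site for a given buffer width, since they carry no information at that scale. Below is a minimal base R sketch. all_zero_features() is a hypothetical helper, shown on a toy data frame that mimics the covariates layout with made-up values.

```r
# Hypothetical helper: names of feature columns that are zero at every site
# for one buffer distance
all_zero_features <- function(df, buff,
                              id_cols = c("array", "site", "camera", "buff_dist")) {
  # Compare as character so this works whether buff_dist is a factor or a number
  sub <- df[as.character(df$buff_dist) == as.character(buff), , drop = FALSE]
  features <- setdiff(names(sub), id_cols)
  is_zero <- vapply(sub[features], function(x) all(x == 0, na.rm = TRUE), logical(1))
  features[is_zero]
}

# Toy data in the same long format (one row per site per buffer; made-up values)
toy_cov <- data.frame(
  array     = "LU13",
  site      = c("LU13_18", "LU13_15", "LU13_18", "LU13_15"),
  camera    = c("18", "15", "18", "15"),
  buff_dist = factor(c("250", "250", "1000", "1000")),
  crop      = c(0, 0, 0, 0),
  pipeline  = c(0, 0.03, 0.02, 0.04),
  trail     = c(0, 0, 0.001, 0)
)

all_zero_features(toy_cov, 250)
all_zero_features(toy_cov, 1000)
```

On the real data, all_zero_features(covariates, 1000) would list such features at the 1000 m buffer. A dplyr alternative that selects those columns is covariates %>% filter(buff_dist == 1000) %>% select(where(~ all(.x == 0))).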

Order data

First, let’s order the columns alphabetically so it is easier to look up each feature’s description in the ABMI document. We want the non-covariate columns (i.e., array, site, camera, buff_dist) at the front, so after ordering all of the columns we use relocate() to move these four back to the front of the data.

covariates <- covariates %>% 
  
  # order columns alphabetically
  select(order(colnames(.))) %>% 
  
  # we want to move the columns that aren't HFI features or landcover to the front
  relocate(.,
           c(array,
             site,
             camera,
             buff_dist))  

# get a list of column names to ensure it worked
names(covariates)
##   [1] "array"                        "site"                        
##   [3] "camera"                       "buff_dist"                   
##   [5] "airp_runway"                  "borrowpit_dry"               
##   [7] "borrowpit_wet"                "borrowpits"                  
##   [9] "camp_industrial"              "campground"                  
##  [11] "canal"                        "cfo"                         
##  [13] "clearing_unknown"             "clearing_wellpad_unconfirmed"
##  [15] "conventional_seismic"         "country_residence"           
##  [17] "crop"                         "cultivation_abandoned"       
##  [19] "dugout"                       "facility_other"              
##  [21] "facility_unknown"             "fruit_vegetables"            
##  [23] "golfcourse"                   "greenspace"                  
##  [25] "grvl_sand_pit"                "harvest_area"                
##  [27] "harvest_area_white_zone"      "interchange_ramp"            
##  [29] "lagoon"                       "landfill"                    
##  [31] "lc_class110"                  "lc_class120"                 
##  [33] "lc_class20"                   "lc_class210"                 
##  [35] "lc_class220"                  "lc_class230"                 
##  [37] "lc_class32"                   "lc_class33"                  
##  [39] "lc_class34"                   "lc_class50"                  
##  [41] "low_impact_seismic"           "mill"                        
##  [43] "mines_oilsands"               "mines_pitlake"               
##  [45] "misc_oil_gas_facility"        "oil_gas_plant"               
##  [47] "open_pit_mine"                "peat"                        
##  [49] "pipeline"                     "recreation"                  
##  [51] "reservoir"                    "residence_clearing"          
##  [53] "ris_airp_runway"              "ris_borrowpits"              
##  [55] "ris_camp_industrial"          "ris_clearing_unknown"        
##  [57] "ris_drainage"                 "ris_facility_operations"     
##  [59] "ris_facility_unknown"         "ris_mines_oilsands"          
##  [61] "ris_oilsands_rms"             "ris_overburden_dump"         
##  [63] "ris_plant"                    "ris_reclaim_ready"           
##  [65] "ris_reclaimed_certified"      "ris_reclaimed_permanent"     
##  [67] "ris_reclaimed_temp"           "ris_road"                    
##  [69] "ris_soil_replaced"            "ris_soil_salvaged"           
##  [71] "ris_tailing_pond"             "ris_tank_farm"               
##  [73] "ris_transmission_line"        "ris_utilities"               
##  [75] "ris_waste"                    "ris_windrow"                 
##  [77] "rlwy_dbl_track"               "rlwy_mlt_track"              
##  [79] "rlwy_sgl_track"               "rlwy_spur"                   
##  [81] "road_gravel_1l"               "road_gravel_2l"              
##  [83] "road_paved_1l"                "road_paved_2l"               
##  [85] "road_paved_3l"                "road_paved_4l"               
##  [87] "road_paved_5l"                "road_paved_div"              
##  [89] "road_paved_undiv_1l"          "road_paved_undiv_2l"         
##  [91] "road_unclassified"            "road_unimproved"             
##  [93] "road_unpaved_1l"              "road_unpaved_2l"             
##  [95] "road_winter"                  "rough_pasture"               
##  [97] "runway"                       "rural_residence"             
##  [99] "sump"                         "surrounding_veg"             
## [101] "tailing_pond"                 "tame_pasture"                
## [103] "trail"                        "transfer_station"            
## [105] "transmission_line"            "truck_trail"                 
## [107] "urban_industrial"             "urban_residence"             
## [109] "vegetated_edge_railways"      "vegetated_edge_roads"        
## [111] "well_aband"                   "well_bitumen"                
## [113] "well_cased"                   "well_cleared_not_confirmed"  
## [115] "well_cleared_not_drilled"     "well_gas"                    
## [117] "well_oil"                     "well_other"                  
## [119] "well_unknown"
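Because the names are sorted as text, lc_class110 and lc_class120 come before lc_class20 in the list above. Below is a minimal base R sketch of the same reordering (identifier columns first, the rest alphabetical), plus an optional numeric ordering of the lc_class columns, using a toy subset of column names.

```r
# Toy subset of the real column names, in arbitrary order
cols <- c("site", "lc_class20", "pipeline", "array", "lc_class110",
          "buff_dist", "camera")

# Non-covariate columns that should stay at the front, in this order
id_cols <- c("array", "site", "camera", "buff_dist")

# Identifier columns first, then all remaining names sorted alphabetically
new_order <- c(id_cols, sort(setdiff(cols, id_cols)))
new_order

# Optional: order lc_class columns by their numeric class code instead of as text
lc <- c("lc_class20", "lc_class110", "lc_class34")
lc_natural <- lc[order(as.integer(sub("lc_class", "", lc, fixed = TRUE)))]
lc_natural
```

With cols <- names(covariates), covariates[new_order] would reorder the real data the same way as the select()/relocate() pipeline above.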

Summary 1000m

Let’s get a summary of each variable. We filter to the 1000 m buffer only, so we don’t get repeated data for every buffer width at each site; this gives a general sense of how much variability each feature has at a representative buffer width. *You can change this if you are interested in a specific buffer width, or if it makes more sense to look at the minimum (250 m) or maximum (5000 m) buffer width.

covariates %>% 
  # filter to just buffer 1000 m
  filter(buff_dist == 1000) %>% 
  
  summary(.)
##   array         site         camera      buff_dist    airp_runway
##  LU13:41   LU13_18:  1   27     :  4   1000   :155   Min.   :0   
##  LU15:39   LU13_15:  1   32     :  4   250    :  0   1st Qu.:0   
##  LU21:36   LU13_03:  1   41     :  4   500    :  0   Median :0   
##  LU01:39   LU13_34:  1   36     :  4   750    :  0   Mean   :0   
##            LU13_57:  1   16     :  3   1250   :  0   3rd Qu.:0   
##            LU13_16:  1   21     :  3   1500   :  0   Max.   :0   
##            (Other):149   (Other):133   (Other):  0               
##  borrowpit_dry       borrowpit_wet         borrowpits       
##  Min.   :0.0000000   Min.   :0.0000000   Min.   :0.0000000  
##  1st Qu.:0.0000000   1st Qu.:0.0000000   1st Qu.:0.0000000  
##  Median :0.0000000   Median :0.0000000   Median :0.0000000  
##  Mean   :0.0009388   Mean   :0.0006446   Mean   :0.0002542  
##  3rd Qu.:0.0000000   3rd Qu.:0.0000000   3rd Qu.:0.0000000  
##  Max.   :0.0300351   Max.   :0.0198622   Max.   :0.0072821  
##                                                             
##  camp_industrial       campground     canal        cfo    clearing_unknown  
##  Min.   :0.0000000   Min.   :0    Min.   :0   Min.   :0   Min.   :0.000000  
##  1st Qu.:0.0000000   1st Qu.:0    1st Qu.:0   1st Qu.:0   1st Qu.:0.000000  
##  Median :0.0000000   Median :0    Median :0   Median :0   Median :0.000000  
##  Mean   :0.0003785   Mean   :0    Mean   :0   Mean   :0   Mean   :0.006422  
##  3rd Qu.:0.0000000   3rd Qu.:0    3rd Qu.:0   3rd Qu.:0   3rd Qu.:0.001654  
##  Max.   :0.0160772   Max.   :0    Max.   :0   Max.   :0   Max.   :0.182912  
##                                                                             
##  clearing_wellpad_unconfirmed conventional_seismic country_residence  
##  Min.   :0.0000000            Min.   :0.000000     Min.   :0.0000000  
##  1st Qu.:0.0000000            1st Qu.:0.002498     1st Qu.:0.0000000  
##  Median :0.0000000            Median :0.005202     Median :0.0000000  
##  Mean   :0.0004428            Mean   :0.006143     Mean   :0.0000828  
##  3rd Qu.:0.0000000            3rd Qu.:0.009577     3rd Qu.:0.0000000  
##  Max.   :0.0117571            Max.   :0.020381     Max.   :0.0128340  
##                                                                       
##       crop   cultivation_abandoned     dugout  facility_other    
##  Min.   :0   Min.   :0.000e+00     Min.   :0   Min.   :0.000000  
##  1st Qu.:0   1st Qu.:0.000e+00     1st Qu.:0   1st Qu.:0.000000  
##  Median :0   Median :0.000e+00     Median :0   Median :0.000000  
##  Mean   :0   Mean   :5.408e-05     Mean   :0   Mean   :0.001119  
##  3rd Qu.:0   3rd Qu.:0.000e+00     3rd Qu.:0   3rd Qu.:0.000000  
##  Max.   :0   Max.   :8.383e-03     Max.   :0   Max.   :0.062266  
##                                                                  
##  facility_unknown    fruit_vegetables   golfcourse   greenspace
##  Min.   :0.000e+00   Min.   :0        Min.   :0    Min.   :0   
##  1st Qu.:0.000e+00   1st Qu.:0        1st Qu.:0    1st Qu.:0   
##  Median :0.000e+00   Median :0        Median :0    Median :0   
##  Mean   :5.746e-05   Mean   :0        Mean   :0    Mean   :0   
##  3rd Qu.:0.000e+00   3rd Qu.:0        3rd Qu.:0    3rd Qu.:0   
##  Max.   :3.281e-03   Max.   :0        Max.   :0    Max.   :0   
##                                                                
##  grvl_sand_pit       harvest_area     harvest_area_white_zone interchange_ramp
##  Min.   :0.000000   Min.   :0.00000   Min.   :0               Min.   :0       
##  1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0               1st Qu.:0       
##  Median :0.000000   Median :0.00000   Median :0               Median :0       
##  Mean   :0.003109   Mean   :0.02293   Mean   :0               Mean   :0       
##  3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:0               3rd Qu.:0       
##  Max.   :0.109732   Max.   :0.42899   Max.   :0               Max.   :0       
##                                                                               
##      lagoon             landfill  lc_class110        lc_class120       
##  Min.   :0.0000000   Min.   :0   Min.   :0.000000   Min.   :0.000e+00  
##  1st Qu.:0.0000000   1st Qu.:0   1st Qu.:0.004449   1st Qu.:0.000e+00  
##  Median :0.0000000   Median :0   Median :0.046414   Median :0.000e+00  
##  Mean   :0.0002406   Mean   :0   Mean   :0.054946   Mean   :3.878e-06  
##  3rd Qu.:0.0000000   3rd Qu.:0   3rd Qu.:0.082004   3rd Qu.:0.000e+00  
##  Max.   :0.0126573   Max.   :0   Max.   :0.231159   Max.   :6.011e-04  
##                                                                        
##    lc_class20       lc_class210      lc_class220       lc_class230     
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.4659   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.00000   Median :0.7228   Median :0.01425   Median :0.02315  
##  Mean   :0.02123   Mean   :0.6400   Mean   :0.10735   Mean   :0.06363  
##  3rd Qu.:0.00000   3rd Qu.:0.8433   3rd Qu.:0.16066   3rd Qu.:0.08982  
##  Max.   :0.38025   Max.   :0.9858   Max.   :0.84274   Max.   :0.47473  
##                                                                        
##    lc_class32   lc_class33         lc_class34        lc_class50     
##  Min.   :0    Min.   :0.000000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0    1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.01205  
##  Median :0    Median :0.000000   Median :0.00000   Median :0.03874  
##  Mean   :0    Mean   :0.004366   Mean   :0.03862   Mean   :0.06991  
##  3rd Qu.:0    3rd Qu.:0.000000   3rd Qu.:0.05870   3rd Qu.:0.09848  
##  Max.   :0    Max.   :0.243332   Max.   :0.25234   Max.   :0.55986  
##                                                                     
##  low_impact_seismic      mill   mines_oilsands mines_pitlake
##  Min.   :0.000000   Min.   :0   Min.   :0      Min.   :0    
##  1st Qu.:0.000000   1st Qu.:0   1st Qu.:0      1st Qu.:0    
##  Median :0.000000   Median :0   Median :0      Median :0    
##  Mean   :0.004172   Mean   :0   Mean   :0      Mean   :0    
##  3rd Qu.:0.000063   3rd Qu.:0   3rd Qu.:0      3rd Qu.:0    
##  Max.   :0.060391   Max.   :0   Max.   :0      Max.   :0    
##                                                             
##  misc_oil_gas_facility oil_gas_plant      open_pit_mine            peat  
##  Min.   :0.000000      Min.   :0.000000   Min.   :0.0000000   Min.   :0  
##  1st Qu.:0.000000      1st Qu.:0.000000   1st Qu.:0.0000000   1st Qu.:0  
##  Median :0.000000      Median :0.000000   Median :0.0000000   Median :0  
##  Mean   :0.002912      Mean   :0.001167   Mean   :0.0005665   Mean   :0  
##  3rd Qu.:0.000000      3rd Qu.:0.000000   3rd Qu.:0.0000000   3rd Qu.:0  
##  Max.   :0.107208      Max.   :0.071271   Max.   :0.0389603   Max.   :0  
##                                                                          
##     pipeline         recreation   reservoir         residence_clearing
##  Min.   :0.00000   Min.   :0    Min.   :0.000e+00   Min.   :0         
##  1st Qu.:0.00000   1st Qu.:0    1st Qu.:0.000e+00   1st Qu.:0         
##  Median :0.02243   Median :0    Median :0.000e+00   Median :0         
##  Mean   :0.02699   Mean   :0    Mean   :2.865e-05   Mean   :0         
##  3rd Qu.:0.03776   3rd Qu.:0    3rd Qu.:0.000e+00   3rd Qu.:0         
##  Max.   :0.12204   Max.   :0    Max.   :4.441e-03   Max.   :0         
##                                                                       
##  ris_airp_runway ris_borrowpits ris_camp_industrial ris_clearing_unknown
##  Min.   :0       Min.   :0      Min.   :0           Min.   :0           
##  1st Qu.:0       1st Qu.:0      1st Qu.:0           1st Qu.:0           
##  Median :0       Median :0      Median :0           Median :0           
##  Mean   :0       Mean   :0      Mean   :0           Mean   :0           
##  3rd Qu.:0       3rd Qu.:0      3rd Qu.:0           3rd Qu.:0           
##  Max.   :0       Max.   :0      Max.   :0           Max.   :0           
##                                                                         
##   ris_drainage ris_facility_operations ris_facility_unknown ris_mines_oilsands
##  Min.   :0     Min.   :0.0000000       Min.   :0            Min.   :0         
##  1st Qu.:0     1st Qu.:0.0000000       1st Qu.:0            1st Qu.:0         
##  Median :0     Median :0.0000000       Median :0            Median :0         
##  Mean   :0     Mean   :0.0003528       Mean   :0            Mean   :0         
##  3rd Qu.:0     3rd Qu.:0.0000000       3rd Qu.:0            3rd Qu.:0         
##  Max.   :0     Max.   :0.0546781       Max.   :0            Max.   :0         
##                                                                               
##  ris_oilsands_rms ris_overburden_dump   ris_plant ris_reclaim_ready
##  Min.   :0        Min.   :0           Min.   :0   Min.   :0        
##  1st Qu.:0        1st Qu.:0           1st Qu.:0   1st Qu.:0        
##  Median :0        Median :0           Median :0   Median :0        
##  Mean   :0        Mean   :0           Mean   :0   Mean   :0        
##  3rd Qu.:0        3rd Qu.:0           3rd Qu.:0   3rd Qu.:0        
##  Max.   :0        Max.   :0           Max.   :0   Max.   :0        
##                                                                    
##  ris_reclaimed_certified ris_reclaimed_permanent ris_reclaimed_temp
##  Min.   :0               Min.   :0.0000000       Min.   :0.000000  
##  1st Qu.:0               1st Qu.:0.0000000       1st Qu.:0.000000  
##  Median :0               Median :0.0000000       Median :0.000000  
##  Mean   :0               Mean   :0.0002803       Mean   :0.000318  
##  3rd Qu.:0               3rd Qu.:0.0000000       3rd Qu.:0.000000  
##  Max.   :0               Max.   :0.0434483       Max.   :0.016762  
##                                                                    
##     ris_road         ris_soil_replaced ris_soil_salvaged ris_tailing_pond   
##  Min.   :0.0000000   Min.   :0         Min.   :0         Min.   :0.0000000  
##  1st Qu.:0.0000000   1st Qu.:0         1st Qu.:0         1st Qu.:0.0000000  
##  Median :0.0000000   Median :0         Median :0         Median :0.0000000  
##  Mean   :0.0000302   Mean   :0         Mean   :0         Mean   :0.0009116  
##  3rd Qu.:0.0000000   3rd Qu.:0         3rd Qu.:0         3rd Qu.:0.0000000  
##  Max.   :0.0046809   Max.   :0         Max.   :0         Max.   :0.1413014  
##                                                                             
##  ris_tank_farm ris_transmission_line ris_utilities   ris_waste  ris_windrow
##  Min.   :0     Min.   :0             Min.   :0     Min.   :0   Min.   :0   
##  1st Qu.:0     1st Qu.:0             1st Qu.:0     1st Qu.:0   1st Qu.:0   
##  Median :0     Median :0             Median :0     Median :0   Median :0   
##  Mean   :0     Mean   :0             Mean   :0     Mean   :0   Mean   :0   
##  3rd Qu.:0     3rd Qu.:0             3rd Qu.:0     3rd Qu.:0   3rd Qu.:0   
##  Max.   :0     Max.   :0             Max.   :0     Max.   :0   Max.   :0   
##                                                                            
##  rlwy_dbl_track rlwy_mlt_track rlwy_sgl_track   rlwy_spur road_gravel_1l    
##  Min.   :0      Min.   :0      Min.   :0      Min.   :0   Min.   :0.000000  
##  1st Qu.:0      1st Qu.:0      1st Qu.:0      1st Qu.:0   1st Qu.:0.000000  
##  Median :0      Median :0      Median :0      Median :0   Median :0.004254  
##  Mean   :0      Mean   :0      Mean   :0      Mean   :0   Mean   :0.004548  
##  3rd Qu.:0      3rd Qu.:0      3rd Qu.:0      3rd Qu.:0   3rd Qu.:0.007252  
##  Max.   :0      Max.   :0      Max.   :0      Max.   :0   Max.   :0.022773  
##                                                                             
##  road_gravel_2l     road_paved_1l road_paved_2l road_paved_3l road_paved_4l
##  Min.   :0.000000   Min.   :0     Min.   :0     Min.   :0     Min.   :0    
##  1st Qu.:0.000000   1st Qu.:0     1st Qu.:0     1st Qu.:0     1st Qu.:0    
##  Median :0.000000   Median :0     Median :0     Median :0     Median :0    
##  Mean   :0.001748   Mean   :0     Mean   :0     Mean   :0     Mean   :0    
##  3rd Qu.:0.000000   3rd Qu.:0     3rd Qu.:0     3rd Qu.:0     3rd Qu.:0    
##  Max.   :0.015867   Max.   :0     Max.   :0     Max.   :0     Max.   :0    
##                                                                            
##  road_paved_5l road_paved_div road_paved_undiv_1l road_paved_undiv_2l
##  Min.   :0     Min.   :0      Min.   :0.0000000   Min.   :0.0000000  
##  1st Qu.:0     1st Qu.:0      1st Qu.:0.0000000   1st Qu.:0.0000000  
##  Median :0     Median :0      Median :0.0000000   Median :0.0000000  
##  Mean   :0     Mean   :0      Mean   :0.0001162   Mean   :0.0005722  
##  3rd Qu.:0     3rd Qu.:0      3rd Qu.:0.0000000   3rd Qu.:0.0000000  
##  Max.   :0     Max.   :0      Max.   :0.0085401   Max.   :0.0118399  
##                                                                      
##  road_unclassified  road_unimproved    road_unpaved_1l road_unpaved_2l
##  Min.   :0.00e+00   Min.   :0.000000   Min.   :0       Min.   :0      
##  1st Qu.:0.00e+00   1st Qu.:0.000000   1st Qu.:0       1st Qu.:0      
##  Median :0.00e+00   Median :0.000000   Median :0       Median :0      
##  Mean   :2.20e-06   Mean   :0.001069   Mean   :0       Mean   :0      
##  3rd Qu.:0.00e+00   3rd Qu.:0.001017   3rd Qu.:0       3rd Qu.:0      
##  Max.   :3.41e-04   Max.   :0.010709   Max.   :0       Max.   :0      
##                                                                       
##   road_winter rough_pasture           runway          rural_residence    
##  Min.   :0    Min.   :0.0000000   Min.   :0.000e+00   Min.   :0.000e+00  
##  1st Qu.:0    1st Qu.:0.0000000   1st Qu.:0.000e+00   1st Qu.:0.000e+00  
##  Median :0    Median :0.0000000   Median :0.000e+00   Median :0.000e+00  
##  Mean   :0    Mean   :0.0001776   Mean   :9.358e-05   Mean   :5.795e-06  
##  3rd Qu.:0    3rd Qu.:0.0000000   3rd Qu.:0.000e+00   3rd Qu.:0.000e+00  
##  Max.   :0    Max.   :0.0149983   Max.   :1.451e-02   Max.   :8.982e-04  
##                                                                          
##       sump          surrounding_veg  tailing_pond  tame_pasture      
##  Min.   :0.000000   Min.   :0       Min.   :0     Min.   :0.000e+00  
##  1st Qu.:0.000000   1st Qu.:0       1st Qu.:0     1st Qu.:0.000e+00  
##  Median :0.000000   Median :0       Median :0     Median :0.000e+00  
##  Mean   :0.003364   Mean   :0       Mean   :0     Mean   :4.727e-06  
##  3rd Qu.:0.002012   3rd Qu.:0       3rd Qu.:0     3rd Qu.:0.000e+00  
##  Max.   :0.033997   Max.   :0       Max.   :0     Max.   :7.326e-04  
##                                                                      
##      trail           transfer_station transmission_line   truck_trail       
##  Min.   :0.0000000   Min.   :0        Min.   :0.000000   Min.   :0.0000000  
##  1st Qu.:0.0000000   1st Qu.:0        1st Qu.:0.000000   1st Qu.:0.0000000  
##  Median :0.0001657   Median :0        Median :0.000000   Median :0.0000000  
##  Mean   :0.0009478   Mean   :0        Mean   :0.007669   Mean   :0.0008284  
##  3rd Qu.:0.0015165   3rd Qu.:0        3rd Qu.:0.000000   3rd Qu.:0.0000000  
##  Max.   :0.0068343   Max.   :0        Max.   :0.070051   Max.   :0.0149490  
##                                                                             
##  urban_industrial   urban_residence vegetated_edge_railways
##  Min.   :0.000000   Min.   :0       Min.   :0              
##  1st Qu.:0.000000   1st Qu.:0       1st Qu.:0              
##  Median :0.000000   Median :0       Median :0              
##  Mean   :0.002782   Mean   :0       Mean   :0              
##  3rd Qu.:0.000000   3rd Qu.:0       3rd Qu.:0              
##  Max.   :0.215891   Max.   :0       Max.   :0              
##                                                            
##  vegetated_edge_roads   well_aband        well_bitumen        well_cased       
##  Min.   :0.00000      Min.   :0.000000   Min.   :0.000000   Min.   :0.0000000  
##  1st Qu.:0.00379      1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.0000000  
##  Median :0.01016      Median :0.001888   Median :0.000000   Median :0.0000000  
##  Mean   :0.01569      Mean   :0.004932   Mean   :0.009243   Mean   :0.0001615  
##  3rd Qu.:0.02866      3rd Qu.:0.007000   3rd Qu.:0.012968   3rd Qu.:0.0000000  
##  Max.   :0.06275      Max.   :0.042874   Max.   :0.083850   Max.   :0.0071111  
##                                                                                
##  well_cleared_not_confirmed well_cleared_not_drilled    well_gas        
##  Min.   :0.0000000          Min.   :0                Min.   :0.000e+00  
##  1st Qu.:0.0000000          1st Qu.:0                1st Qu.:0.000e+00  
##  Median :0.0000000          Median :0                Median :0.000e+00  
##  Mean   :0.0006285          Mean   :0                Mean   :8.574e-05  
##  3rd Qu.:0.0000000          3rd Qu.:0                3rd Qu.:0.000e+00  
##  Max.   :0.0365581          Max.   :0                Max.   :2.579e-03  
##                                                                         
##     well_oil   well_other        well_unknown
##  Min.   :0   Min.   :0.000000   Min.   :0    
##  1st Qu.:0   1st Qu.:0.000000   1st Qu.:0    
##  Median :0   Median :0.000000   Median :0    
##  Mean   :0   Mean   :0.001517   Mean   :0    
##  3rd Qu.:0   3rd Qu.:0.000000   3rd Qu.:0    
##  Max.   :0   Max.   :0.030134   Max.   :0    
## 

Histograms 1000m

Let’s also plot a histogram of each variable with a for loop. I do this for just one buffer width to reduce replicates. Note that variables whose values are all zero won’t be plotted, so you could explore this at other buffer widths, or remove the filter() step and look at all the data, which is what I do below once the data are grouped.

# filter to just one buffer width

covariates_1000 <- covariates %>%  
  
  filter(buff_dist == 1000)

for (col in 1:ncol(covariates_1000)) {
    hist(covariates_1000[,col])
}
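Why do all-zero variables disappear? If `covariates` is a tibble (as `covariates_grouped` is, per the str() output further down), `covariates_1000[,col]` returns a one-column data frame rather than a vector. With Hmisc loaded, hist() then dispatches to Hmisc’s hist.data.frame(), which by default (`n.unique = 3`) only draws variables with at least three distinct values. The base-R sketch below lists, for one buffer width, the numeric variables that would be skipped under that rule. The toy data frame `covs_1000` and the helper `too_few_values()` are hypothetical stand-ins, not part of the original script.

```r
# Hypothetical stand-in for `covariates_1000` (toy values, one row per site)
covs_1000 <- data.frame(
  site     = c("A", "B", "C", "D"),
  pipeline = c(0.000, 0.010, 0.030, 0.020),
  landfill = c(0, 0, 0, 0),
  runway   = c(0, 0, 0.0145, 0)
)

# Hypothetical helper: numeric columns with fewer than `min_unique` distinct
# (non-missing) values; min_unique = 3 mirrors the default n.unique of
# Hmisc::hist.data.frame()
too_few_values <- function(df, min_unique = 3) {
  num <- df[vapply(df, is.numeric, logical(1))]
  n_distinct <- vapply(num, function(x) length(unique(x[!is.na(x)])),
                       integer(1))
  names(n_distinct)[n_distinct < min_unique]
}

skipped <- too_few_values(covs_1000)
print(skipped)
```

Here `landfill` (all zeros) and `runway` (only two distinct values) are flagged, while `pipeline` would be plotted.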

Now we can use the information from the previous few steps, together with the variable descriptions from the ABMI wall-to-wall human footprint data download website for Year 2021, to group the covariates and reduce the number of potential variables to explore in the modeling phase. Those descriptions are stored in the ‘relevant literature’ portion of this document AND also copied into the README file.

Format covariates

Group covariates

We will use the mutate() function with some tidyverse trickery (i.e., nesting across() and contains() inside rowSums()) to sum, for each observation (row), the values of all columns whose names contain a given character string. If there isn’t a common character string for multiple variables we want to sum, we add each one individually. We can also combine these methods (e.g., with ‘facilities’; see code).

covariates_grouped <- covariates %>% 
  
  # rename 'vegetated_edge_roads' so that we can use 'road' as a keyword to group roads without including this feature
  rename('vegetated_edge_rds' = vegetated_edge_roads) %>% 
  
  # within the mutate function create new column names for the grouped variables
  mutate(
    # borrowpits
    # rowSums() with across() and contains() sums, for each row, the values of all columns whose names contain the keyword;
    # be careful that no variables you DON'T want to include also match the keyword!
    borrowpits = rowSums(across(contains('borrowpit'))) +
      
      dugout +
      lagoon +
      sump,
    
    
    # clearings
    clearings = rowSums(across(contains('clearing'))) +
      runway,
    
    # cultivations
    cultivation = crop + 
      cultivation_abandoned +
      fruit_vegetables +
      rough_pasture +
      tame_pasture,
    
    # harvest areas
    harvest = rowSums(across(contains('harvest'))),
    
    # industrial facilities
    facilities = rowSums(across(contains('facility'))) +
      rowSums(across(contains('plant'))) +
      camp_industrial +
      mill +
      ris_camp_industrial +
      ris_tank_farm +
      ris_utilities +
      urban_industrial,
    
    # mine areas
    mines = rowSums(across(contains('mine'))) +
      rowSums(across(contains('tailing'))) +
      grvl_sand_pit +
      peat +
      ris_drainage +
      ris_oilsands_rms +
      ris_overburden_dump +
      ris_reclaim_ready +
      ris_soil_salvaged +
      ris_waste,
    
    # railways
    railways = rowSums(across(contains('rlwy'))),
    
    # reclaimed areas
    reclaimed = rowSums(across(contains('reclaimed'))) +
      ris_soil_replaced +
      ris_windrow,
    
    # recreation areas
    recreation = campground +
      golfcourse +
      greenspace +
      recreation,
    
    # residential areas (can't use residence as keyword because 'residence_clearing' is in clearing unless we rearrange groupings or rename that one)
    residential = country_residence +
      rural_residence +
      urban_residence,
    
    # roads (we renamed 'vegetated_edge_roads' to 'vegetated_edge_rds' above so we can use 'road' as the keyword here, which saves a lot of code as there are many road variables)
    roads = rowSums(across(contains('road'))) +
      interchange_ramp +
      airp_runway +
      ris_airp_runway +
      transfer_station,
    
    # seismic lines
    seismic_lines = conventional_seismic,
    
    # 3D seismic lines
    seismic_lines_3D = low_impact_seismic,
    
    # transmission lines
    transmission_lines = rowSums(across(contains('transmission'))),
    
    # trails
    trails = rowSums(across(contains('trail'))),
    
    # vegetated edges
    veg_edges = rowSums(across(contains('vegetated'))) +
      surrounding_veg,
    
    # man-made water features
    water = canal +
      reservoir,
    
    # well sites (if a 'clearing_wellpad' column exists, 'well' also matches it and it is already counted in clearings; need to check for double counting)
    wells = rowSums(across(contains('well'))),
    
    # remove columns that were used to create new columns to tidy the data frame
    .keep = 'unused') %>% 
  
  # reorder variables so the veg data is after all the HFI data
  relocate(starts_with('lc_class'),
           .after = wells)
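Because contains() matches any column whose name includes the keyword (ignoring case by default), it is worth checking what each keyword actually picks up before trusting the sums. The base-R sketch below mimics that matching on a few toy columns: it shows why ‘vegetated_edge_roads’ had to be renamed, flags columns claimed by more than one keyword (here a hypothetical ‘clearing_wellpad’ column matching both ‘clearing’ and ‘well’), and then sums the matched columns row by row. The data values and the helper `match_cols()` are hypothetical, and `grepl(..., fixed = TRUE)` only approximates tidyselect’s contains().

```r
# Toy covariate columns (hypothetical values), named like the ABMI features
covs_toy <- data.frame(
  road_gravel_1l       = c(0.010, 0.000),
  road_paved_2l        = c(0.020, 0.005),
  vegetated_edge_roads = c(0.100, 0.200),
  clearing_wellpad     = c(0.004, 0.000),
  well_aband           = c(0.002, 0.001)
)

# Hypothetical helper: literal, case-insensitive substring match on column
# names, similar in spirit to tidyselect::contains()
match_cols <- function(df, keyword) {
  names(df)[grepl(tolower(keyword), tolower(names(df)), fixed = TRUE)]
}

# Before renaming, 'road' also picks up the vegetated edge column
roads_before <- match_cols(covs_toy, "road")
print(roads_before)

# Rename so the keyword no longer matches it
names(covs_toy)[names(covs_toy) == "vegetated_edge_roads"] <- "vegetated_edge_rds"

# Flag columns claimed by more than one keyword (possible double counting)
keywords <- c("road", "clearing", "well", "vegetated")
hits <- unlist(lapply(keywords, function(k) match_cols(covs_toy, k)))
overlap <- unique(hits[duplicated(hits)])
print(overlap)

# Sum the matched columns for each row, like rowSums(across(contains('road')))
covs_toy$roads <- rowSums(covs_toy[match_cols(covs_toy, "road")])
print(covs_toy$roads)
```

Running the overlap check on the real column names, with the keywords used in the mutate() call, would show whether any feature is counted in two groups.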

# see what's left
names(covariates_grouped)
##  [1] "array"              "site"               "camera"            
##  [4] "buff_dist"          "borrowpits"         "cfo"               
##  [7] "landfill"           "pipeline"           "recreation"        
## [10] "clearings"          "cultivation"        "harvest"           
## [13] "facilities"         "mines"              "railways"          
## [16] "reclaimed"          "residential"        "roads"             
## [19] "seismic_lines"      "seismic_lines_3D"   "transmission_lines"
## [22] "trails"             "veg_edges"          "water"             
## [25] "wells"              "lc_class110"        "lc_class120"       
## [28] "lc_class20"         "lc_class210"        "lc_class220"       
## [31] "lc_class230"        "lc_class32"         "lc_class33"        
## [34] "lc_class34"         "lc_class50"
# check the structure of new data
str(covariates_grouped)
## tibble [3,100 × 35] (S3: tbl_df/tbl/data.frame)
##  $ array             : Factor w/ 4 levels "LU13","LU15",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ site              : Factor w/ 155 levels "LU13_18","LU13_15",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ camera            : Factor w/ 96 levels "18","15","03",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ buff_dist         : Factor w/ 20 levels "250","500","750",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ borrowpits        : num [1:3100] 0 0 0 0 0 ...
##  $ cfo               : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ landfill          : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ pipeline          : num [1:3100] 0 0.068 0 0 0.0301 ...
##  $ recreation        : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ clearings         : num [1:3100] 0.0923 0.0697 0 0 0 ...
##  $ cultivation       : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ harvest           : num [1:3100] 0 0 0.687 0.337 0 ...
##  $ facilities        : num [1:3100] 0.291 0 0 0 0 ...
##  $ mines             : num [1:3100] 0 0.0873 0 0 0 ...
##  $ railways          : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ reclaimed         : num [1:3100] 0 0.0477 0 0 0 ...
##  $ residential       : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ roads             : num [1:3100] 0 0.0174 0 0 0 ...
##  $ seismic_lines     : num [1:3100] 0 0.03276 0 0.00889 0.01145 ...
##  $ seismic_lines_3D  : num [1:3100] 0 0 0 0 0.0523 ...
##  $ transmission_lines: num [1:3100] 0.0642 0 0 0 0.091 ...
##  $ trails            : num [1:3100] 0.00588 0.0028 0 0.01591 0 ...
##  $ veg_edges         : num [1:3100] 0 0.0858 0 0 0 ...
##  $ water             : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ wells             : num [1:3100] 0 0 0 0 0.0322 ...
##  $ lc_class110       : num [1:3100] 0.193 0.348 0 0 0.178 ...
##  $ lc_class120       : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ lc_class20        : num [1:3100] 0.0361 0 0 0 0 ...
##  $ lc_class210       : num [1:3100] 0.456 0.358 0.186 1 0.822 ...
##  $ lc_class220       : num [1:3100] 0 0 0 0 0 ...
##  $ lc_class230       : num [1:3100] 0 0.101 0.255 0 0 ...
##  $ lc_class32        : num [1:3100] 0 0 0 0 0 0 0 0 0 0 ...
##  $ lc_class33        : num [1:3100] 0 0.101 0 0 0 ...
##  $ lc_class34        : num [1:3100] 0 0.0916 0 0 0 ...
##  $ lc_class50        : num [1:3100] 0.316 0 0.559 0 0 ...
# check summary of new data
summary(covariates_grouped)
##   array          site          camera       buff_dist      borrowpits      
##  LU13:820   LU13_18:  20   27     :  80   250    : 155   Min.   :0.000000  
##  LU15:780   LU13_15:  20   32     :  80   500    : 155   1st Qu.:0.000000  
##  LU21:720   LU13_03:  20   41     :  80   750    : 155   Median :0.001649  
##  LU01:780   LU13_34:  20   36     :  80   1000   : 155   Mean   :0.004302  
##             LU13_57:  20   16     :  60   1250   : 155   3rd Qu.:0.004453  
##             LU13_16:  20   21     :  60   1500   : 155   Max.   :0.310957  
##             (Other):2980   (Other):2660   (Other):2170                     
##       cfo               landfill    pipeline         recreation       
##  Min.   :0.000e+00   Min.   :0   Min.   :0.00000   Min.   :0.000e+00  
##  1st Qu.:0.000e+00   1st Qu.:0   1st Qu.:0.00000   1st Qu.:0.000e+00  
##  Median :0.000e+00   Median :0   Median :0.01350   Median :0.000e+00  
##  Mean   :8.077e-07   Mean   :0   Mean   :0.01937   Mean   :4.904e-05  
##  3rd Qu.:0.000e+00   3rd Qu.:0   3rd Qu.:0.02812   3rd Qu.:0.000e+00  
##  Max.   :1.215e-03   Max.   :0   Max.   :0.28897   Max.   :1.337e-02  
##                                                                       
##    clearings          cultivation           harvest          facilities      
##  Min.   :0.0000000   Min.   :0.0000000   Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.0000000   1st Qu.:0.0000000   1st Qu.:0.00000   1st Qu.:0.000000  
##  Median :0.0005278   Median :0.0000000   Median :0.00000   Median :0.000000  
##  Mean   :0.0060419   Mean   :0.0009397   Mean   :0.01868   Mean   :0.006653  
##  3rd Qu.:0.0040539   3rd Qu.:0.0000000   3rd Qu.:0.01348   3rd Qu.:0.002769  
##  Max.   :0.4024400   Max.   :0.1253361   Max.   :0.83674   Max.   :0.335753  
##                                                                              
##      mines             railways   reclaimed         residential       
##  Min.   :0.000000   Min.   :0   Min.   :0.000000   Min.   :0.0000000  
##  1st Qu.:0.000000   1st Qu.:0   1st Qu.:0.000000   1st Qu.:0.0000000  
##  Median :0.000000   Median :0   Median :0.000000   Median :0.0000000  
##  Mean   :0.005448   Mean   :0   Mean   :0.001002   Mean   :0.0001473  
##  3rd Qu.:0.000000   3rd Qu.:0   3rd Qu.:0.000000   3rd Qu.:0.0000000  
##  Max.   :0.557884   Max.   :0   Max.   :0.078321   Max.   :0.0180541  
##                                                                       
##      roads          seismic_lines      seismic_lines_3D   transmission_lines
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.001040   1st Qu.:0.002686   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.004019   Median :0.006602   Median :0.000000   Median :0.000000  
##  Mean   :0.006218   Mean   :0.006732   Mean   :0.004302   Mean   :0.005597  
##  3rd Qu.:0.008650   3rd Qu.:0.009985   3rd Qu.:0.001360   3rd Qu.:0.007232  
##  Max.   :0.071829   Max.   :0.045536   Max.   :0.087550   Max.   :0.173909  
##                                                                             
##      trails            veg_edges            water               wells          
##  Min.   :0.000e+00   Min.   :0.000000   Min.   :0.000e+00   Min.   :0.0000000  
##  1st Qu.:9.465e-05   1st Qu.:0.001437   1st Qu.:0.000e+00   1st Qu.:0.0008692  
##  Median :7.187e-04   Median :0.006425   Median :0.000e+00   Median :0.0068416  
##  Mean   :1.516e-03   Mean   :0.011335   Mean   :1.254e-05   Mean   :0.0143883  
##  3rd Qu.:1.958e-03   3rd Qu.:0.015562   3rd Qu.:0.000e+00   3rd Qu.:0.0167246  
##  Max.   :3.864e-02   Max.   :0.147895   Max.   :7.896e-03   Max.   :0.3045854  
##                                                                                
##   lc_class110       lc_class120          lc_class20       lc_class210    
##  Min.   :0.00000   Min.   :0.0000000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.01970   1st Qu.:0.0000000   1st Qu.:0.00000   1st Qu.:0.4607  
##  Median :0.03874   Median :0.0000000   Median :0.00000   Median :0.6749  
##  Mean   :0.04838   Mean   :0.0007554   Mean   :0.02741   Mean   :0.6324  
##  3rd Qu.:0.06218   3rd Qu.:0.0000000   3rd Qu.:0.03361   3rd Qu.:0.8364  
##  Max.   :0.73192   Max.   :0.1211446   Max.   :0.51965   Max.   :1.0000  
##                                                                          
##   lc_class220        lc_class230        lc_class32          lc_class33       
##  Min.   :0.000000   Min.   :0.00000   Min.   :0.000e+00   Min.   :0.0000000  
##  1st Qu.:0.002332   1st Qu.:0.01218   1st Qu.:0.000e+00   1st Qu.:0.0000000  
##  Median :0.044977   Median :0.03595   Median :0.000e+00   Median :0.0000000  
##  Mean   :0.113317   Mean   :0.06341   Mean   :1.748e-05   Mean   :0.0046114  
##  3rd Qu.:0.154669   3rd Qu.:0.08419   3rd Qu.:0.000e+00   3rd Qu.:0.0005702  
##  Max.   :0.971773   Max.   :0.72101   Max.   :1.175e-02   Max.   :0.3242328  
##                                                                              
##    lc_class34         lc_class50     
##  Min.   :0.000000   Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.:0.02385  
##  Median :0.004043   Median :0.05717  
##  Mean   :0.030311   Mean   :0.07933  
##  3rd Qu.:0.038149   3rd Qu.:0.11545  
##  Max.   :0.557178   Max.   :0.60824  
## 
# there are some NAs in the data, which will cause problems with modeling/visualization; ignore them for now, but explore these sites specifically after the report

covariates_grouped <- covariates_grouped %>% 
  
  # remove rows with NAs
  na.omit()
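Before dropping rows, it can help to record which sites and buffer widths na.omit() removes, so they can be followed up later as the comment above suggests. A minimal base-R sketch, assuming a hypothetical stand-in data frame `covs` in place of `covariates_grouped`: complete.cases() finds the incomplete rows and is.na() names the missing variables in each.

```r
# Hypothetical stand-in for covariates_grouped with missing values
covs <- data.frame(
  site      = c("LU13_18", "LU13_15", "LU13_03"),
  buff_dist = c(250, 250, 250),
  roads     = c(0.001, NA, 0.004),
  wells     = c(0.010, 0.002, NA)
)

# Rows that na.omit() would drop
incomplete <- covs[!complete.cases(covs), ]

# For each dropped row, list the variables that are missing
incomplete$missing_vars <- apply(is.na(incomplete), 1,
                                 function(row) paste(names(row)[row], collapse = ", "))

print(incomplete[, c("site", "buff_dist", "missing_vars")])
```

Saving this table (e.g., with write.csv()) before calling na.omit() would keep a record of the affected sites.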

Grouped histograms

Let’s look at the histograms again and see if we need to remove any features or feature groups without enough data.

# use for loop to plot histograms for all covariates

for (col in 5:ncol(covariates_grouped)) {
    hist(covariates_grouped[,col])
}
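To make the ‘not enough variation’ call a little less subjective, one option is to compute the share of observations with a non-zero value for each variable. This base-R sketch uses a hypothetical stand-in `covs` in place of `covariates_grouped`, and the 25% cut-off is an arbitrary assumption for illustration, not a threshold used in this analysis.

```r
# Hypothetical stand-in for covariates_grouped (toy values)
covs <- data.frame(
  site     = paste0("S", 1:5),
  pipeline = c(0.000, 0.013, 0.028, 0.004, 0.020),
  cfo      = c(0, 0, 0, 0, 0.0012),
  landfill = c(0, 0, 0, 0, 0)
)

# Keep the numeric covariates only
num <- covs[vapply(covs, is.numeric, logical(1))]

# Share of observations with a non-zero value, smallest first
prop_nonzero <- sort(colMeans(num > 0, na.rm = TRUE))
print(prop_nonzero)

# Candidates to drop or merge; the 0.25 cut-off is an arbitrary assumption
sparse_vars <- names(prop_nonzero)[prop_nonzero < 0.25]
print(sparse_vars)
```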

IMO we don’t have enough variation in the data to use the following features/feature groups:

  • cfo
  • cultivation
  • reclaimed
  • recreation
  • reservoir
  • residential
  • water
  • lc_class20 (aka water)
  • lc_class120 (aka agriculture)
  • lc_class32 (aka rocks and rubble)
  • lc_class33 (aka exposed land)

We also don’t have any data for the following features, since they don’t plot with the hist() function (all of their values are zero):

  • landfill
  • railways

Also, there isn’t much data for the following features, which are similar and of interest to OSM, so they have been grouped together in the past and we will do the same here:

  • borrowpits
  • clearings
  • facilities
  • mines

Format covariates further

So let’s modify the data and remove those features for now; this step will likely need to be changed each year.

Let’s also rename the landcover classes so they make sense without having to look them up by number (this could be moved earlier in the script next year).

covariates_grouped <- covariates_grouped %>% 
  
  # create column osm_industrial
  mutate(
    osm_industrial = borrowpits +
      clearings +
      facilities +
      mines,
    
    # remove columns we used to make this variable
    .keep = 'unused') %>% 
  
  # remove other features we don't need
  select(!c(cfo,
            cultivation,
            reclaimed,
            recreation,
            residential,
            water,
            lc_class20,
            lc_class120,
            lc_class32,
            lc_class33,
            landfill,
            railways)) %>%
  
  # rename landcover classes
  rename(
    grassland = lc_class110,
    coniferous = lc_class210,
    broadleaf = lc_class220,
    mixed = lc_class230,
    developed = lc_class34,
    shrub = lc_class50) 

# check that it worked
names(covariates_grouped)
##  [1] "array"              "site"               "camera"            
##  [4] "buff_dist"          "pipeline"           "harvest"           
##  [7] "roads"              "seismic_lines"      "seismic_lines_3D"  
## [10] "transmission_lines" "trails"             "veg_edges"         
## [13] "wells"              "grassland"          "coniferous"        
## [16] "broadleaf"          "mixed"              "developed"         
## [19] "shrub"              "osm_industrial"
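Since the rename step may move earlier in the script next year, one option is to keep the class codes in a lookup table that can be reused from year to year. A base-R sketch, assuming a hypothetical stand-in data frame `lc`; the code-to-name pairs are copied from the rename() call above, and `lc_lookup` is a hypothetical name.

```r
# Hypothetical stand-in with numeric landcover column names
lc <- data.frame(site        = c("A", "B"),
                 lc_class110 = c(0.19, 0.35),
                 lc_class210 = c(0.46, 0.36),
                 lc_class50  = c(0.32, 0.00))

# Lookup table: class code column -> readable name (pairs from rename() above)
lc_lookup <- c(lc_class110 = "grassland",
               lc_class210 = "coniferous",
               lc_class220 = "broadleaf",
               lc_class230 = "mixed",
               lc_class34  = "developed",
               lc_class50  = "shrub")

# Rename only the columns present in the lookup; all others are left alone
hit <- names(lc) %in% names(lc_lookup)
names(lc)[hit] <- unname(lc_lookup[names(lc)[hit]])
print(names(lc))
```

Inside the tidyverse pipeline, the same table could be passed to dplyr’s rename() as `rename(any_of(setNames(names(lc_lookup), lc_lookup)))`, since rename() accepts a named character vector of the form new_name = 'old_name'.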

Subset data by buffer

We need to subset the data so we have separate data frames for each buffer width, both to work with in the analysis AND to explore correlation between variables at each buffer width, as these may vary with spatial scale.

Let’s use a for loop to subset the data.

buffer_frames <- list()

for (i in unique(covariates_grouped$buff_dist)){
  
  print(i)
  
  # Subset data based on radius
  df <- covariates_grouped %>%
    filter(buff_dist == i)
  
  # list of dataframes
  buffer_frames <- c(buffer_frames, list(df))
}
## [1] "250"
## [1] "500"
## [1] "750"
## [1] "1000"
## [1] "1250"
## [1] "1500"
## [1] "1750"
## [1] "2000"
## [1] "2250"
## [1] "2500"
## [1] "2750"
## [1] "3000"
## [1] "3250"
## [1] "3500"
## [1] "3750"
## [1] "4000"
## [1] "4250"
## [1] "4500"
## [1] "4750"
## [1] "5000"
# name list objects so we can extract names for plotting 

buffer_frames <- buffer_frames %>% 
  
  # name each element after its buffer width (e.g., '250 meter buffer'), in the same order as the loop above
  purrr::set_names(paste(unique(covariates_grouped$buff_dist), 'meter buffer'))

Now we have a list with data frames for each buffer width which we can work with later.
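The loop plus set_names() steps can also be done with base R’s split(), which returns one data frame per level of a factor. A sketch on a hypothetical stand-in `covs`; note that split() orders the list by factor level, whereas the loop above follows the order in which buffer widths first appear (the same order here, 250 to 5000, judging by the printed output).

```r
# Hypothetical stand-in for covariates_grouped
covs <- data.frame(
  site      = rep(c("LU13_18", "LU13_15"), times = 3),
  buff_dist = factor(rep(c(250, 500, 1000), each = 2),
                     levels = c(250, 500, 1000)),
  roads     = c(0.001, 0.004, 0.003, 0.006, 0.005, 0.008)
)

# One data frame per buffer width, in factor-level order (unused levels dropped)
buffer_frames_split <- split(covs, covs$buff_dist, drop = TRUE)

# Name each element like '250 meter buffer'
names(buffer_frames_split) <- paste(names(buffer_frames_split), "meter buffer")
print(names(buffer_frames_split))
```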

Autocorrelation

Correlation plots

Now we need to make correlation plots for each buffer width to see which variables are correlated at a given spatial scale. We can use purrr::map() with the chart.Correlation() function from the PerformanceAnalytics package to make correlation plots with a specified method (e.g., Pearson, Spearman), which also show histograms and scatterplots of each variable.

correlation_plots <- buffer_frames %>% 
  
  purrr::map(
    ~.x %>% 
      
      # keep numeric variables only, since correlation coefficients can't be computed for factors
      select_if(is.numeric) %>% 
      
      # plot pairwise correlations with histograms and scatterplots
      chart.Correlation(.,
                        histogram = TRUE, 
                        method = "pearson")
  )
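chart.Correlation() draws the plots, but filling in the correlation tables below is easier when the strongly correlated pairs are listed directly. A base-R sketch using cor() on a hypothetical, simulated stand-in for one buffer’s data frame (`buf_250`); the |r| ≥ 0.7 cut-off is a common rule of thumb, not a value taken from this analysis.

```r
# Hypothetical simulated stand-in for one element of buffer_frames
set.seed(1)
n <- 50
roads <- runif(n)
buf_250 <- data.frame(
  buff_dist  = factor(rep(250, n)),
  roads      = roads,
  veg_edges  = roads * 0.9 + rnorm(n, sd = 0.05),  # built to track roads
  coniferous = runif(n)                             # unrelated variable
)

# Pairwise Pearson correlations among the numeric covariates
num <- buf_250[vapply(buf_250, is.numeric, logical(1))]
r <- cor(num, method = "pearson", use = "pairwise.complete.obs")

# Keep each pair once (upper triangle) and flag |r| at or above the cut-off
idx <- which(upper.tri(r) & abs(r) >= 0.7, arr.ind = TRUE)
high_pairs <- data.frame(var1 = rownames(r)[idx[, "row"]],
                         var2 = colnames(r)[idx[, "col"]],
                         r    = round(r[idx], 2))
print(high_pairs)
```

Wrapping these lines in a function and mapping it over `buffer_frames` would give one table of correlated pairs per buffer width.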

Correlation table

250m

Each buffer width has its own section listing the variables that are strongly correlated with each other and thus should not be included in the same model, along with their correlation coefficient (r).

  • roads & LC

500m

Exploratory plots

Add more to this section later, when we have more time to explore the covariates and choose which should be included, etc.

# shrink the figure margins; otherwise the histograms won't plot and R returns a "figure margins too large" error
par(mar=c(1,1,1,1))

# now use purrr to plot histograms for all remaining HFI variables for each buffer
hfi_histograms <- buffer_frames %>% 
  
  purrr::imap(
    ~.x %>% 
      
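      # keep just the HFI variables (numeric columns other than the landcover classes, which were renamed above)
      select(where(is.numeric) &
               !c(grassland, coniferous, broadleaf, mixed, developed, shrub)) %>% 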
      
      # pipe into hist.data.frame function to make histograms for each variable
      hist.data.frame(mtitl = paste0('Histograms of HFI variables at ', .y)))

Now let’s do the same thing with the landcover variables.

lc_histograms <- buffer_frames %>% 
  
  purrr::imap(
    ~.x %>% 
      
      # keep just the landcover variables (the lc_class columns were renamed above)
      select(grassland, coniferous, broadleaf, mixed, developed, shrub) %>% 
      
      # pipe into hist.data.frame function to make histograms for each variable
      hist.data.frame(mtitl = paste0('Histograms of landcover variables at ', .y)))
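The `par(mar = c(1,1,1,1))` call at the start of this section changes the margins for every later plot in the session. par() returns the previous values of the settings it changes, so one way to restore them afterwards is sketched below; it draws to a null PDF device and uses a random histogram purely for illustration.

```r
# Draw to a null PDF device so the example opens no window and writes no file
pdf(NULL)
default_mar <- par("mar")

# par() invisibly returns the old values of the settings it changes
old_par <- par(mar = c(1, 1, 1, 1))
small_mar <- par("mar")

hist(rnorm(100), main = "Histogram with small margins")

# Restore the previous margins for any later plots
par(old_par)
restored_mar <- par("mar")

dev.off()
```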